Only ancestral, streptomycin and control treatments are included, unless otherwise stated.
Authors
Niina Smolander
Shane Hogle
Libraries and functions
Prior to this analysis, the variants have been filtered using Mutect2’s FilterMutectCalls as well as separately for SNPs and INDELs using the following hard filtering factors: SOR, FS, MQ, MQRankSum, ReadPosRankSum, SNP/INDEL clustering (3/35) and SNP/INDEL proximity.
Keeping only the variants where allelic depth (AD) ≥ 5, coverage (DP) ≥ 5 and VAF ≥ 0.05. Also variants that are in mobile genetic elements are removed.
Looking into how many mutations are found in how many replicates in different HAMBIs at the timepoint 3. Fixed variants (VAF = 1) seen in the timepoint 0 are excluded (but not replicate by replicate). Using https://slhogle.github.io/hambiDoubleEvo/R/metagenome/02_parallelism.html as reference.
Distribution of nucleotide multiplicity (the number of replicates with the same genomic mutation) for HAMBI x history x treatment combinations. The observed data in blue, the null expectation in red.
Code
nuc_survival <- variants_filt %>% dplyr::select(HAMBI, History, Experiment, POS, REF, ALT, Replicate) %>%# multiplicity = number of replicate populations each unique mutation is# observed insummarize(m =n_distinct(Replicate), .by =c(HAMBI, History, Experiment, POS, REF, ALT)) %>%# now calculate the total number of mutations across all replicates so we need# to ungroup by mutation position/alt allele but because we still want to# determine this value by treatment category we keep the treatment category# grouping. However, this should be changed if you want to for example average# over all the treatment conditions on a species basisgroup_by(HAMBI, History, Experiment) %>%count(m, name ="m_count") %>%mutate(n = m * m_count,Ntot =sum(n),perc = n / Ntot *100) %>%left_join(chrom_lengths, by =join_by(HAMBI)) %>%arrange(cur_group_id(), desc(m)) %>%# dpois() tells the probability mass at a given number of counts. Here we# want to get the probability of observing n mutations with multiplicity# = mi (i.e. the counts of mi in the observed data). We assume that# mutations independently occur on the genome of size Ltot at a rate of# lambda = Ntot/Ltot and that generally the events are rare. Thus this# situation can be modeled by the Poisson distribution. We can get the# binned number of mutations per level of multiplicity m by multiplying# the probability by the length of the genome and the binned mutations# divided by the total number of mutations.mutate(m_count_expected =cumsum((m_count / Ntot) * sum_bases *dpois(m, lambda = Ntot / sum_bases))) %>%#dplyr::select(-num_contigs) %>%relocate(m, n, Ntot, perc, m_count, m_count_expected) %>%ungroup()# setup for plottingnuc_survival_plot <- nuc_survival %>%group_by(HAMBI, History, Experiment) %>%# when there is only one multiplicity observed for a mutation filter such# that the multiplicty of that mutation must be greater than one.# Otherwise include all remaining mutations (m > 0)filter(case_when(n() ==1~ m >1,TRUE~ m >0)) %>%pivot_longer(cols =c("m_count", "m_count_expected")) %>%mutate(label =paste(HAMBI, "History:", History, "\nExperiment:", Experiment)) %>%# and make final plotplot_nuc_survival(., 5000, c(1, 10, 100, 1000, 10000), 4)nuc_survival_plot
Barplot of the mutation distributions
Split by history and experiment.
Code
HAMBI_mutations_filt2 %>%ggplot(aes(x = mi, y = n, fill = HAMBI)) +geom_bar(stat ="identity") +facet_grid(Experiment ~ History)
Plot of variants that existed in t0 (YSK) samples in at least two replicates. Plot split by experimental condition, NA panel has the variants that were seen in YSK but not in timepoint 3 (t3). Color indicates the difference in variant allele frequency between t3 and t0 (t3 minus t0). Fixed variants (VAF >= 0.9) marked with the text “YSK” / “t3” / “Both”.